Abstract

We introduce FlowCutter, a novel algorithm to compute a set of edge cuts or node 
separators that optimize cut size and balance in the Pareto-sense. Our core algorithm 
solves the balanced connected st-edge-cut problem, where two given nodes s and t must 
be separated by removing edges to obtain two connected parts. Using the core algorithm 
we build variants that compute node separators and are independent of s and t. Using 
the Pareto-set we can identify cuts with a particularly good trade-off between cut size and 
balance that can be used to compute contraction and minimum fill-in orders, which can 
be used in Customizable Contraction Hierarchies (CCH), a speed-up technique for shortest 
path computations. Our core algorithm runs in O(cm) time where m is the number of
edges and c the cut size. This makes it well-suited for large graphs with small cuts, such as 
road graphs, which are our primary application. For road graphs we present an extensive 
experimental study demonstrating that FlowCutter outperforms the current state of the art 
both in terms of cut sizes as well as CCH performance. 


Partial support by DFG grant WA654/19-1 and Google Focused Research Award. 



1 Introduction 


Cutting a graph into two pieces of roughly the same size along a small cut is a fundamental 
and NP-hard [12] graph problem that has received a lot of attention and has many
applications. The application motivating our research is accelerating shortest path computations
on roads [20], but in the appendix we also present bisection experiments on non-road
graphs. Dijkstra’s algorithm [10] solves the shortest path problem in near-linear time.
However, this is not fast enough if the graph consists of a whole continent’s road network. 
Acceleration algorithms exploit that road networks rarely change and compute auxiliary data 
in a preprocessing phase. This data is independent of the path’s endpoints and can therefore 
be reused for many shortest path computations. Often the auxiliary data consists of cuts. The 
basic idea is: Given a graph G and a cut C the algorithms precompute for every node how to 
get to every edge/node in C. To compute a path the algorithms first determine whether the 
endpoints are on opposite sides of C or not. If they are on opposite sides then the algorithms 
only need to assemble the precomputed paths towards C and pick the best one. If they are on 
the same side then the graph search can be pruned at C. This halves the graph that needs to 
be searched. As half a continent is still large the idea is applied recursively. The effectiveness 
of these techniques crucially depends on the size of the cuts found. Fortunately road graphs 
have small cuts because of geographical features such as rivers or mountains. Previous work has 
coined the term natural cuts for this phenomenon [7]. However, identifying these natural cuts 
is a difficult problem. Fortunately, as roads change only slowly, preprocessing running times 
are significantly less important than cut quality. One of these preprocessing-based techniques
is Customizable Contraction Hierarchies (CCH) [9]. We demonstrate the performance of our
algorithms using CCH. The CCH-auxiliary data is tightly coupled with tree-decompositions 
and minimum fill-in orders. Our algorithms are therefore also applicable in that domain. 

Graph partitioning software used for road graphs includes KaHip, Metis [15], Inertial
Flow [19], and PUNCH [7]. We experimentally compare FlowCutter with the first three as we
unfortunately have no implementation of PUNCH.¹ The cut problem is formalized as a bicriteria
problem optimizing the cut size and the imbalance. The imbalance measures how much the sizes 
of both sides differ and is small if the sides are balanced. The standard approach is to bound the 
imbalance and minimize the cut size. However, this approach has some shortcomings. Consider 
a graph with a million nodes and set the max imbalance to 1%. An algorithm finds a cut 
C1 with 180 edges and 0.9% imbalance. Is this a good cut? It seems good as 180 is small
compared to the node count. However, we would come to a different conclusion if we knew
that a cut C2 with 90 edges and 1.1% imbalance existed. In our application, shortest paths,
moving a few nodes to the other side of a cut is no problem. However, halving the cut
size has a huge impact. The cut C2 is thus clearly superior. Further assume that a third cut
C3 with 180 edges and 0.7% imbalance existed. C3 dominates C1 in both criteria. However, both are
equivalent with respect to the standard problem formulation and thus a tool is not required to
output C3 instead of C1. To overcome these problems our approach computes a set of cuts that
optimize cut size and imbalance in the Pareto sense. A further significant shortcoming of the 
state-of-the-art partitioners, with the exception of Inertial Flow, is that they were designed for 
small imbalances. Common benchmarks, such as [21], only include test cases with imbalances
up to 5%. However, for our application imbalances of 50% are fine. For such high imbalances
unexpected things happen with the standard software: for example, increasing the allowed
imbalance can increase the achieved cut sizes.
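The dominance argument for C1, C2 and C3 can be made concrete with a small Pareto filter. The following Python sketch is only an illustration and not part of FlowCutter; the function name `pareto_front` and the representation of a cut as a `(cut size, imbalance)` pair are assumptions made for this example.

```python
from typing import List, Tuple

Cut = Tuple[int, float]  # (cut size, imbalance in percent); hypothetical representation


def pareto_front(cuts: List[Cut]) -> List[Cut]:
    """Return the cuts that are not dominated in (cut size, imbalance)."""
    front: List[Cut] = []
    best_imbalance = float("inf")
    # Sort by cut size, ties broken by imbalance. A cut survives only if it is
    # strictly better balanced than every cut that is at most as large.
    for size, imbalance in sorted(set(cuts)):
        if imbalance < best_imbalance:
            front.append((size, imbalance))
            best_imbalance = imbalance
    return front


# The three cuts from the text: C1, C2, C3.
example = [(180, 0.9), (90, 1.1), (180, 0.7)]
front = pareto_front(example)
```

On this input C1 is filtered out because C3 has the same size and a smaller imbalance, while C2 and C3 both remain since neither dominates the other.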

Contribution. We introduce FlowCutter, a graph bisection algorithm that optimizes cut size 
and imbalance in the Pareto sense. The core FlowCutter algorithm solves the balanced edge- 
st-cut graph bisection problem with connected sides. Using this core as subroutine we design 

algorithms to solve the node separator and non-st variants. Using these we design a nested
dissection-based algorithm to compute contraction node orders as needed by Customizable
Contraction Hierarchies (CCH). These orders are also called minimum fill-in orders or elimination
orders and can be used to compute good tree-decompositions. We prove that the core
algorithm’s running time is in O(cm) where m is the edge count and c the cut size. We show in
an extensive experimental evaluation that this is a perfect fit for road graphs that are large in
terms of edge count but small in terms of cut size.

¹ Further, Microsoft holds a PUNCH patent, which restricts commercial applications.

Outline. We define our terminology and introduce related concepts in the preliminaries. The
next section introduces the core idea of the st-bisection algorithm. In the following section
we describe the piercing heuristic, a subroutine needed in the core algorithm. In the section
afterwards we describe extensions of the core algorithm: general bisection, node bisection, and
computing contraction orders. Finally, we present an experimental evaluation with a comparison
against the current state of the art. In the appendix we present further experiments including
experiments on non-road graphs, and a detailed running time analysis.

2 Preliminaries 

A graph is denoted by G = (V, A) with node set V and arc set A. We set n := |V| and
m := |A|. We consider undirected, simple graphs, which we interpret as symmetric directed
graphs. Our core algorithm also works on directed graphs, which is important for the computation
of node separators. A cut (V1, V2) is a partition of V into two disjoint sets V1 and V2 such that
V = V1 ∪ V2. The size of a cut is the number of arcs from V1 to V2. A separator (V1, V2, Q) is a
partition of V into three disjoint sets V1, V2 and Q such that V = V1 ∪ V2 ∪ Q. No arc connecting
V1 and V2 may exist. The cardinality of Q is the separator’s size. The imbalance ε ∈ [0, 1] of a
cut or separator is defined as the smallest number such that max{|V1|, |V2|} ≤ ⌈(1 + ε)n/2⌉. An
ST-cut/separator is a cut/separator between two disjoint node sets S and T such that S ⊆ V1
and T ⊆ V2. The expansion of a cut/separator is the cut’s size divided by min{|V1|, |V2|}. Our
method builds upon unit flows that are computed using augmenting paths. A node x is
source-(target)-reachable if a non-saturated sx-path (respectively xt-path) exists with s ∈ S (t ∈ T).
We denote by S_R (T_R) the set of all source-(target)-reachable nodes. The minimum ST-cut size
corresponds to the maximum ST-flow intensity. We define the source side cut as (S_R, V \ S_R) and
the target side cut as (T_R, V \ T_R). Note that in general max-flows and min-cuts are not unique.
However, the source side and target side cuts are.
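The two quality measures defined above can be checked mechanically. The Python sketch below is an illustration under stated assumptions: the helper names `is_balanced` and `expansion` are hypothetical, side sizes are passed as plain integers, and exact rational arithmetic is used so that the ceiling in the balance condition is not disturbed by floating-point rounding.

```python
from fractions import Fraction
from math import ceil


def is_balanced(size_v1: int, size_v2: int, n: int, eps) -> bool:
    """Check max(|V1|, |V2|) <= ceil((1 + eps) * n / 2) with exact arithmetic."""
    # Fraction(str(eps)) turns a decimal such as 0.2 into exactly 1/5.
    bound = ceil((1 + Fraction(str(eps))) * n / 2)
    return max(size_v1, size_v2) <= bound


def expansion(cut_size: int, size_v1: int, size_v2: int) -> Fraction:
    """Cut (or separator) size divided by the size of the smaller side."""
    return Fraction(cut_size, min(size_v1, size_v2))


# A graph with 10 nodes split 6/4 by a cut of 3 arcs.
ok_at_20_percent = is_balanced(6, 4, 10, 0.2)   # bound is ceil(6) = 6
ok_at_0_percent = is_balanced(6, 4, 10, 0)      # bound is ceil(5) = 5
ratio = expansion(3, 6, 4)                       # 3 / min(6, 4)
```

For a separator, `n` is still the total node count |V|, so the separator nodes in Q count towards n but towards neither side.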

Customizable Contraction Hierarchies (CCH) are an acceleration algorithm for shortest
path computations. We only give a high-level overview, as we use CCH only to evaluate the
quality of our cuts. No part of FlowCutter builds upon CCH. The details are in [9]. The central
operation is the node contraction: Contracting a node v consists of removing v and adding edges
between all of v’s unconnected neighbors. The input to CCH consists of a node contraction order
along which the nodes are iteratively contracted. This yields a supergraph G' of the input graph.
The weights of G' are computed using an algorithm that essentially enumerates all triangles in
G' in the so-called customization phase. Note that contrary to the order computation, having a
fast customization phase is useful as it allows us to incorporate changes to the weights quickly.
Such changes could for example be caused by traffic congestion. CCH can also be used if several
weights exist on the same road graph. Having regular cars and trucks is an example of such
a situation. The CCH structure can be shared and does not have to be replicated for each
weight. It is sufficient to replicate the weights. We therefore discern between memory that
is independent of the weights and shared and memory that is needed per weight. Given the
weights of G', the shortest path query consists of a bidirectional search in G' only following arcs
(x, y) such that x appears before y in the order. The search space of a node z is the subgraph
Figure 1: The ellipse represents a graph and the curved lines are cuts. The “+”-signs represent
source nodes and “x”-signs represent target nodes. Panels: (a) balanced cut C, (b) unbalanced
cut C, (c) extra sources to avoid C, (d) source side cut C', (e) target side cut C'.


of G' that is reachable from z while only following such arcs. Smaller search spaces yield faster 
queries. Fewer triangles in G' yield a faster customization. Fewer arcs in G' result in less memory
consumption. All these quality metrics depend on the contraction order, whose quality depends 
on the cuts used in its construction. Finding these cuts is where FlowCutter fits into the big 
picture. 
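To illustrate how the contraction order determines the supergraph G', the following Python sketch contracts the nodes of a small undirected graph in a given order and collects the resulting arcs. It is a simplified illustration and not the CCH implementation of [9]; the function name `contract_in_order` and the adjacency-dictionary input are assumptions made for this example.

```python
from typing import Dict, Hashable, List, Set, Tuple


def contract_in_order(adj: Dict[Hashable, Set[Hashable]],
                      order: List[Hashable]) -> Set[Tuple[Hashable, Hashable]]:
    """Return the arcs (x, y) of the supergraph G', with x before y in the order."""
    # Work on a copy so that the caller's graph is left untouched.
    nbrs = {v: set(ns) for v, ns in adj.items()}
    arcs: Set[Tuple[Hashable, Hashable]] = set()
    for v in order:
        remaining = nbrs.pop(v)  # all not yet contracted neighbors of v
        for u in remaining:
            arcs.add((v, u))     # u is contracted later, so v comes before u
            nbrs[u].discard(v)
        # Contracting v adds edges between all of its remaining neighbors.
        for a in remaining:
            for b in remaining:
                if a != b:
                    nbrs[a].add(b)
    return arcs


# Path a - b - c: contracting the middle node first adds the shortcut a - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
arcs_middle_first = contract_in_order(path, ["b", "a", "c"])
arcs_ends_first = contract_in_order(path, ["a", "b", "c"])
```

Contracting the middle node first yields three arcs, while contracting the end nodes first yields only the two original ones, which is the kind of difference the order quality metrics above measure.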

Tree-Decompositions. The constructed supergraph G' is chordal, which is a graph class
tightly coupled with tree-decompositions. The maximal cliques of G', which can be efficiently
identified in chordal graphs, correspond to the bags of a tree-decomposition. A corresponding
tree backbone can be efficiently computed. The maximum clique size in G' is thus an upper
bound to the tree-width of the input graph.


3 Core Algorithm 

Our algorithm works by computing a sequence of increasingly balanced st-min-cuts until the
imbalance drops below a given input parameter ε. The intermediate cuts form, after removing
dominated ones, the computed Pareto-set. Consider the situation depicted in Figure 1. Initially
s is the only source node and t is the only target node. We start by computing an st-min-cut C,
which is the first cut in the sequence. If we are lucky and C is sufficiently balanced as in
Figure 1a, our algorithm is finished. However, most of the time we are unlucky and we either
have the situation depicted in Figure 1b, where the source’s side is too small, or the analogous
situation where the target’s side is too small. Assume without loss of generality that the
source’s side is too small. Our algorithm now transforms non-source nodes into additional source
nodes to invalidate C and computes a new, more balanced st-min-cut C', the second cut in the
sequence. To invalidate C our algorithm does two things: It marks all nodes on the source’s side
of the cut as source nodes and marks one node on the target’s side that is incident to a cut edge
as a source node. This node on the target’s side is called the piercing node and the corresponding
cut arc is called the piercing arc. The situation is


    S ← {s}; T ← {t};
    S_R ← S; T_R ← T;
    f-grow S_R; b-grow T_R;
    while S ∩ T = ∅ do
        if S_R ∩ T_R ≠ ∅ then
            augment flow;
            S_R ← S; T_R ← T;
            f-grow S_R; b-grow T_R;
        else
            if |S_R| ≤ |T_R| then
                f-grow S;              // now S = S_R
                output S-cut arcs;
                x ← pierce node;
                S ← S ∪ {x};
                S_R ← S_R ∪ {x};
                f-grow S_R;
            else
                // Same for T and T_R

Figure 2: st-Bisection Algo
illustrated in Figure 1c. All nodes on the source’s side are
marked to ensure that C' does not cut through the source’s side. The piercing node is necessary
to ensure that C' ≠ C. Choosing a good piercing arc is crucial for good quality. In this section
we assume that we have a piercing oracle that determines the piercing arc given C in time linear
in the size of C. In Section 4 we describe heuristics to implement such a piercing oracle. We
want to achieve that C' has a better balance than C. However, this is only true if C' is a source
side cut as in Figure 1d. If C' is a target side cut as in Figure 1e then C' might have a worse
balance than C. Luckily, as our algorithm progresses, either the target side will catch up with
the balance of the source side or another source side cut is found. In both cases our algorithm
eventually finds a cut with a better balance than C.

Our algorithm computes the st-min-cuts by finding max-flows and using the max-flow-min-cut
duality. We assign unit capacities to every edge and compute the flow by successively
searching for augmenting paths. A core observation of our algorithm is that turning nodes into
sources or targets never invalidates the flow. It is only possible that new augmenting paths are
created, increasing the maximum flow intensity. Given a set of nodes X we say that forward
growing (f-grow for short) X consists of adding all nodes y to X for which a node x ∈ X and
a non-saturated xy-path exist. Analogously backward growing X (b-grow for short) consists
of adding all nodes y for which a non-saturated yx-path exists. The growing operations are
implemented using a graph traversal algorithm (such as a DFS or BFS) that only follows
non-saturated arcs. The algorithm maintains besides the flow values four node sets: the set of sources
S, the set of targets T, the set of source-reachable nodes S_R, and the set of target-reachable nodes
T_R. Note that an augmenting path exists if and only if S_R ∩ T_R ≠ ∅. Initially we set S = {s} and
T = {t}. Our algorithm works in rounds. In every round it tests whether an augmenting path
exists. If one exists the flow is augmented and S_R and T_R are recomputed. If no augmenting
path exists then it must enlarge either S or T. This operation also yields the next cut. It then
selects a piercing arc and grows S_R and T_R accordingly. The pseudo-code is depicted in Figure 2.
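The two growing operations can be sketched as breadth-first searches that only follow non-saturated arcs. The Python code below is a minimal illustration and not the authors' implementation. It assumes unit capacities, a skew-symmetric flow stored in a dictionary `flow` (an arc (x, y) is saturated when `flow[(x, y)] == 1`, and missing entries mean zero flow), and adjacency dictionaries `out_arcs` and `in_arcs`; all of these names are hypothetical.

```python
from collections import deque


def forward_grow(nodes, out_arcs, flow):
    """f-grow: add every y reachable from `nodes` via non-saturated arcs (in place)."""
    queue = deque(nodes)
    while queue:
        x = queue.popleft()
        for y in out_arcs[x]:
            if y not in nodes and flow.get((x, y), 0) < 1:
                nodes.add(y)
                queue.append(y)
    return nodes


def backward_grow(nodes, in_arcs, flow):
    """b-grow: add every y from which `nodes` is reachable via non-saturated arcs."""
    queue = deque(nodes)
    while queue:
        x = queue.popleft()
        for y in in_arcs[x]:
            if y not in nodes and flow.get((y, x), 0) < 1:
                nodes.add(y)
                queue.append(y)
    return nodes


# Symmetric path s - a - b - t with unit capacities.
out_arcs = {v: [] for v in "sabt"}
in_arcs = {v: [] for v in "sabt"}
for u, v in [("s", "a"), ("a", "b"), ("b", "t")]:
    for x, y in ((u, v), (v, u)):
        out_arcs[x].append(y)
        in_arcs[y].append(x)

# Without flow, S_R and T_R intersect, so an augmenting path exists.
s_r = forward_grow({"s"}, out_arcs, {})
t_r = backward_grow({"t"}, in_arcs, {})

# After pushing one unit along s -> a -> b -> t the sets no longer meet.
flow = {}
for x, y in [("s", "a"), ("a", "b"), ("b", "t")]:
    flow[(x, y)] = 1
    flow[(y, x)] = -1
s_r_after = forward_grow({"s"}, out_arcs, flow)
t_r_after = backward_grow({"t"}, in_arcs, flow)
```

Because each call only ever adds nodes to the set it is given, repeated growing between two augmentations touches every node at most once per set, which is the observation behind the running time argument.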


Running Time Overview. Assuming a piercing oracle with a running time linear in the
current cut size, we can show that the algorithm has a running time in O(cm) where c is the size
of the most balanced cut found and m is the number of edges in the graph. The detailed argument
requires a non-trivial amortized running time analysis and is in the appendix. However, the core
argument is simple: All sets only grow unless we find an augmenting path. As each node can
only be added once to each set, the running time between finding two augmenting paths is linear.
In total we find c augmenting paths. The total running time is thus in O(cm).


4 Pierce Heuristic 


In this section we describe how we implement the piercing oracle used in the previous section.
Given an unbalanced arc cut C the piercing oracle should select a piercing arc that is not part of
the final balanced cut in at most O(|C|) time. Piercing the source side cut and piercing the
target side cut are analogous and we therefore only describe the procedure for the source
side. Denote by a = (q, p) the piercing arc with piercing node p ∉ S.

Primary Heuristic: Avoid Augmenting Paths. The first heuristic consists of avoiding
augmenting paths whenever possible. Piercing an arc a leads to an augmenting path if and only
if p ∈ T_R, i.e., a non-saturated path from p to a target node exists. As our algorithm has
computed T_R it can determine in constant time whether piercing an arc would increase the size
of the next cut. The proposed heuristic consists of preferring edges with p ∉ T_R if possible. It is
possible that none or multiple p ∉ T_R



Figure 3: The curves represent cuts, the current one is solid. The arrows are cut-arcs, bold
ones result in augmenting paths. The dashed cut is the next cut where piercing any arc results
in an augmenting path.
exist. In this case our algorithm employs a further heuristic to choose
the piercing arc among them. However, note that the secondary heuristic is often only relevant in
the case that none exists. Consider the situation depicted in Figure 3. Suppose for the argument
that the target node is still far away and that the perfectly balanced cut is significantly larger.
Our algorithm can choose between three piercing arcs a, b, and c. It will not pick a as this
would increase the cut size. The question that remains is whether b or c is better. The answer is
that it nearly never matters. Piercing b or c does not modify the flow and thus does not change
which piercing arcs result in larger cuts. The algorithm will therefore eventually end up with
the dashed cut independent of whether b or c is pierced. We know that the dashed cut has the
same size as all cuts found between the current cut and the dashed cut. Further, the dashed cut
has the best balance among them and therefore dominates all of them. This means that most of
the time our avoid-augmenting-paths heuristic does the right thing. However, it is less effective
when cuts approach perfect balance. The reason is that the source and target sides meet.
When approaching perfect balance our algorithm results in a race between source and target
sides to claim the last nodes. Not the best side wins, but the first that gets there.

Secondary Heuristic: Distance-Based. Our algorithm picks a piercing arc such that
dist(p, t) − dist(s, p) is maximized, where s and t are the original source and target nodes. The
dist(p, t)-term avoids that the source side cut and target side cut meet, as nodes close to t are
more likely to be close to the target side cut. Subtracting dist(s, p) is motivated by the
observation that s has a high likelihood of being positioned far away from the balanced cuts. A
piercing node close to s is therefore likely on the same side as s. Our algorithm precomputes the
distances from s and t to all nodes before the core algorithm is run. This allows it to evaluate
dist(p, t) − dist(s, p) in constant time inside the piercing oracle. The distance heuristic has
a geometric interpretation as depicted in Figure 4. We interpret the
distance as Euclidean distance. If s and t are points in the plane then the set of points p for
which ‖p − t‖₂ − ‖p − s‖₂ = c holds for some constant c is one branch of a hyperbola. The figure
depicts the branches for c = 1.3 and c = 0.7. The heuristic prefers piercing nodes on the c = 1.3
branch as it maximizes c. A consequence of this is that the heuristic works well if the desired
cut follows roughly a line perpendicular to the line through s and t. This heuristic works on
many graphs but there are instances where it breaks down, such as cuts that follow a circle-like
shape. Note that this geometric interpretation also works in higher-dimensional spaces.
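The combination of the primary and the secondary heuristic for the source side can be written as a single comparison key. The following Python sketch is an illustration only; the function name `pick_piercing_arc`, the list of candidate arcs `cut_arcs`, and the precomputed distance dictionaries `dist_s` and `dist_t` are assumptions, and how the distances are measured is left to the caller.

```python
def pick_piercing_arc(cut_arcs, target_reachable, dist_s, dist_t):
    """Choose a piercing arc (q, p) among the current source side cut arcs.

    Primary rule: prefer arcs whose piercing node p is not target-reachable,
    since piercing them creates no augmenting path.
    Secondary rule: among equally good arcs, maximize dist(p, t) - dist(s, p).
    """
    def key(arc):
        _, p = arc
        avoids_augmenting_path = p not in target_reachable
        return (avoids_augmenting_path, dist_t[p] - dist_s[p])

    # One pass over the cut arcs, i.e. linear in the cut size.
    return max(cut_arcs, key=key)


# Three candidate arcs as in Figure 3: piercing "a" would create an
# augmenting path, "b" and "c" would not.
cut_arcs = [("q1", "a"), ("q2", "b"), ("q3", "c")]
dist_s = {"a": 1, "b": 4, "c": 2}
dist_t = {"a": 9, "b": 5, "c": 5}
chosen = pick_piercing_arc(cut_arcs, {"a"}, dist_s, dist_t)
```

In this example the arc to "a" has the best distance score but is rejected by the primary rule, so the arc to "c" is chosen over the arc to "b" by the distance term.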

5 Extensions 

General Cuts. Our core algorithm computes balanced st-cuts. However, in many situations
the overall smallest balanced cut is required. This problem variant can be solved with high
probability by running FlowCutter multiple times with st-pairs picked uniformly at random.
Indeed, suppose that C is an optimal cut such that the larger side has αn nodes (i.e., α =
(ε + 1)/2) and q is the number of st-pairs. The probability that C separates a random st-pair is
2α(1 − α). The success probability over all q st-pairs is thus 1 − (1 − 2α(1 − α))^q. For ε = 33%
and q = 20 the success probability is 99.99%. For larger α this rate decreases. However, it is
still large enough for all practical purposes, as for α = 0.9 (i.e., ε = 80%) and q = 20 the rate
still is 98.11%. The number of st-pairs needed does not depend on the size of the graph nor on
the cut size. If the instances are run one after another then the running time depends on the
worst cut’s size, which may be more than c. We therefore run the instances simultaneously and
stop once one instance has found a cut of size c. The running time is thus in O(cm).
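The success probabilities quoted above follow directly from the formula. A short Python sketch, with the hypothetical helper name `success_probability`, reproduces the arithmetic:

```python
def success_probability(eps: float, q: int) -> float:
    """Probability that at least one of q random st-pairs is separated by C."""
    alpha = (eps + 1) / 2                  # relative size of the larger side
    separated = 2 * alpha * (1 - alpha)    # one random pair lands on opposite sides
    return 1 - (1 - separated) ** q


p_33 = success_probability(0.33, 20)   # alpha = 0.665
p_80 = success_probability(0.80, 20)   # alpha = 0.9
```

For ε = 33% a single pair is separated with probability 2 · 0.665 · 0.335 ≈ 0.446, so all 20 pairs fail with probability of roughly 0.554^20 ≈ 7.5 · 10^-6. For ε = 80% the per-pair probability is 0.18 and 0.82^20 ≈ 0.0189, which gives the 98.11% stated in the text.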

Note that this argumentation relies on the assumption that it is enough to find an st-pair
that is separated. However, in practice the positions of s and t in their respective sides influence
the performance of our piercing heuristic. As a result it is possible that in practice more st-pairs
are needed than predicted by theory.

Figure 4: Geometric interpretation of the distance heuristic.

Node Separators. To compute contraction orders node separators are needed and not
edge cuts. To achieve this we employ a standard construction to model node capacities in
flow problems. We transform the symmetric input graph G = (V, A) into a directed
expanded graph G' = (V', A') and compute flows on G'. We expand G into G' as follows:
For each node x ∈ V there are two nodes x_i and x_o in V'. We refer to x_i as the in-node and
to x_o as the out-node of x. There is an internal arc (x_i, x_o) ∈ A' for every node x ∈ V. We
further add for every arc (x, y) ∈ A an external arc (x_o, y_i) to A'. The construction is
illustrated in Figure 5. For a source-target pair s and t in G we run the core algorithm with
source node s_o and target node t_i in G'. The algorithm computes a sequence of cuts in G'.
Each of the cut arcs in G' corresponds to a separator node
or a cut edge in G depending on whether the arc in G' is internal or external. From this mixed
cut our algorithm derives a node separator by choosing for every cut edge in G the endpoint on
the larger side. Unfortunately, using this construction, it is possible that the graph is separated
into more than two components, i.e., we can no longer guarantee that both sides are connected.
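The expansion of G into G' can be sketched in a few lines of Python. This is an illustration under assumptions: the function name `expand` is hypothetical, in-nodes and out-nodes are represented as pairs `(x, "i")` and `(x, "o")`, and the input arc list is expected to contain both directions of every undirected edge.

```python
def expand(nodes, arcs):
    """Build the expanded directed graph G' = (V', A') from G = (V, A)."""
    # Two copies of every node: the in-node (x, "i") and the out-node (x, "o").
    expanded_nodes = [(x, side) for x in nodes for side in ("i", "o")]
    # Internal arcs model the unit node capacity of x.
    internal = [((x, "i"), (x, "o")) for x in nodes]
    # External arcs leave the out-node of x and enter the in-node of y.
    external = [((x, "o"), (y, "i")) for (x, y) in arcs]
    return expanded_nodes, internal, external


# A single undirected edge a - b, given as two symmetric arcs.
v_exp, internal, external = expand(["a", "b"], [("a", "b"), ("b", "a")])
```

Keeping internal and external arcs in separate lists makes it easy to tell afterwards whether a cut arc in G' stands for a separator node or for a cut edge of G.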

Contraction Orders. Using a nested dissection [16] variant our algorithm constructs
contraction orders. It bisects G along a node separator Q into subgraphs G1 and G2. It recursively
computes orders for G1 and G2. The order of G is the order of G1 followed by the order of G2
followed by the nodes in Q in an arbitrary order. Selecting Q is non-trivial. After some
experimentation we went with the following heuristic: Pick the separator with minimum expansion
and at most 60% imbalance. As base case for the recursion we use trees and cliques. On cliques
any order is optimal and on trees an optimal order can be derived from an optimal node ranking,
which can be computed in linear time [18].
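The recursive structure of nested dissection can be sketched as follows. The Python code is a simplified illustration, not the algorithm evaluated in this paper: the separator computation is passed in as a hypothetical callback `find_separator`, the callback signals a base case by returning `None` (standing in for the tree and clique base cases), and the toy separator below is only valid for path graphs whose nodes are consecutive integers.

```python
def nested_dissection_order(nodes, adj, find_separator):
    """Order of G1, then order of G2, then the separator nodes Q."""
    nodes = set(nodes)
    if len(nodes) <= 1:
        return list(nodes)
    parts = find_separator(nodes, adj)
    if parts is None:              # assumption: callback declares a base case
        return sorted(nodes)
    v1, v2, q = parts
    return (nested_dissection_order(v1, adj, find_separator)
            + nested_dissection_order(v2, adj, find_separator)
            + sorted(q))           # any order of Q is allowed; sorted for determinism


def toy_path_separator(nodes, adj):
    """Toy separator for a path of consecutive integers: the middle node."""
    ordered = sorted(nodes)
    if len(ordered) < 3:
        return None
    mid = len(ordered) // 2
    return set(ordered[:mid]), set(ordered[mid + 1:]), {ordered[mid]}


# Path 0 - 1 - 2 - 3 - 4 - 5 - 6.
path_adj = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}
order = nested_dissection_order(range(7), path_adj, toy_path_separator)
```

The top-level separator, node 3, ends up last in the order, and the separators of the two halves, nodes 1 and 5, are placed at the end of their respective blocks.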

Road graphs have many nodes of degree 1 or 2. We exploit this in a fast preprocessing
step to significantly reduce the graph size. Our algorithm determines the largest biconnected
component B using [14] in linear time. It then removes all edges from G that leave B. It continues
independently on every connected component of G. The resulting orders are concatenated. The
order of B must be last. The other orders can be concatenated in an arbitrary way. For each
connected component our algorithm identifies the degree-2-chains. For a chain x, y_1, ..., y_k, z it
removes all y_i and adds an edge from x to z unless x or z have degree 1. The y_i nodes and x or z
if they have degree 1 are positioned at the front of the order. Their relative order is determined
using the optimal tree ordering algorithm. All remaining nodes are ordered behind them. After
eliminating degree-2-chains our algorithm uses the nested dissection algorithm described above.

6 Experiments 

We compare FlowCutter to the state-of-the-art partitioners KaHip, Metis, and InertialFlow. We
present three experiments: (1) we compare the produced contraction orders in terms of CCH
performance, (2) we compare the Pareto-cut-sets, and (3) we evaluate FlowCutter on non-road
graphs using the Walshaw benchmark set. The last experiment is in the appendix and can be
summarized as follows: For ε = 5% there are only 6 out of 24 graphs where FlowCutter does not
match the best known solutions. For 3 of them FlowCutter is off by at most 5 edges. All
experiments were run on a Xeon E5-1630 v3 @ 3.70GHz with 128GB DDR4-2133 RAM.



Figure 5: Expansion of an undirected graph G 
into a directed graph G'. The dotted arrows are 
internal arcs. The solid arrows are external arcs. 
6.1 Order Experiments


We compute contraction orders for 4 DIMACS road graphs [8]. The smallest is Colorado with
n = 436K and m = 1M. Next is California and Nevada with n = 1.9M and m = 4.6M, followed
by (Western) Europe with n = 18M and m = 44M and finally a graph encompassing the whole
USA with n = 24M and m = 57M.

We use FlowCutter with all extensions in two variants denoted by F20 and F3, with 20
respectively 3 random source-target-pairs. We use the ndmetis tool of Metis 5.1.0 with the
default parameters and refer to it as M. Unfortunately KaHip² and InertialFlow do not provide
order computation tools. We therefore implemented basic nested dissection ordering algorithms
on top of them. The KaHip implementation was already used in [9] and is the current state of
the art in terms of order quality. We refer to it as K. The tool iteratively computes cuts using
KaHip-strong 0.61 with different random seeds until the cut size does not decrease for 10 rounds.
We set ε = 20% for KaHip. This value is comparatively small, but KaHip has problems with
large ε as demonstrated in the Pareto-cut experiments in Section 6.2. Note that this setup solely
optimizes order quality, disregarding order computation times, which therefore can certainly be
improved. We therefore report the corresponding running times as upper bounds. Note that
we argue that FlowCutter is superior mostly because of the achieved order quality, not because
it is particularly fast. Not having well-tuned KaHip running times is therefore not problematic
for our comparison. We reimplemented InertialFlow and were able to reproduce the cuts and
running times of the original publication with our implementation. It is not randomized and
therefore computing several cuts with different random seeds per graph as for KaHip is not
useful. As a consequence the reported running times adequately represent the performance of a
basic nested dissection algorithm combined with InertialFlow. InertialFlow is denoted by I and
we set ε = 60%. Both KaHip and InertialFlow compute edge cuts. We turn them into node
separators by choosing the endpoints of the cut edges on the larger side.


Results. Our results are summarized in Table 1. We observe that, modulo small cache effects,
the customization time is correlated with the number of triangles and the average query running
time is correlated with the number of arcs in the CCH. The memory needed per weight is
correlated with the number of arcs in the CCH. The CCH-structure memory consumption is
dominated by the list of precomputed triangles and thus the amount of necessary memory
is correlated with the number of triangles. All these correlations are unsurprising and were
predicted by CCH theory. Denote by n_s and m_s the number of nodes and arcs in the search
space. For the average numbers we observe that 1.7 < n_s(n_s − 1)/(2 m_s) < 2.6 and for the
maximum numbers we observe that 2.1 < n_s(n_s − 1)/(2 m_s) < 3.9, which indicates that the
search spaces are nearly complete graphs. The number of nodes and the number of arcs are thus
related. We can thus say that a search space is small or large without indicating whether we
refer to nodes or arcs.

FlowCutter produces the smallest search spaces. Using more source-target pairs results in 
better orders, but already 3 give a decent order. Inertial Flow is dominated by KaHip with 
the exception of the USA graph. Metis is last by a significant margin on all but the smallest 
graph. The ratio between the average and the maximum size is very interesting. A high ratio 
indicates that a partitioner often finds good cuts, but at least one cut is comparatively bad. This 
ratio is never close to 1, indicating that road graphs are not perfectly homogeneous. In some 
regions, probably cities, the cuts are worse than in some other regions, probably the
countryside. Compared to the competitors, the ratio is however higher for InertialFlow. This illustrates
that its geography-based heuristic is effective most of the time but not always. 

A small search space is not equivalent to the CCH containing only few arcs. It is possible
that vertices are shared between many search spaces and thus the CCH can be significantly
smaller than the sum of the search space sizes. This effect occurs and explains why the number

² Some preliminary work was done in [22].
      Search space nodes  Search space arcs  CCH    CCH        Treewidth  Order       Custom.  Query  Memory      Memory
      avg      max        avg     max        arcs   triangles  bound      time        time     time   per weight  structure

                                             6.5    36.4       180        9.9         88       60     50          335
K     187.7    483        7.0     37         7.5    34.2       160        <18659.3    90       30     57          326
I     191.4    605        7.1     53         6.9    34.1       161        42.6        84       31     52          320
F3    177.5    356        6.2     24         5.9    23.4       127        64.1        69       27     45          231
F20   170.0    380        5.6     26         5.8    21.8       132        386.8       66       26     44          218

M     1223.4   1983       441.4   933        69.9   1390.4     926        125.9       2242     1162   533         11210
K     638.6    1224       114.3   284        73.9   578.2      482        <213091.1   975      304    564         5044
I     732.9    1569       149.7   414        67.4   589.7      516        1017.2      932      385    514         5082
F3    734.1    1159       140.2   312        60.3   519.4      531        2532.7      853      366    460         4491
F20   616.0    1102       102.8   268        58.8   459.6      455        16841.5     780      271    449         4024

M     990.9    1685       249.1   633        86.0   1241.1     676        170.8       2084     651    656         10217
K     575.5    1041       71.3    185        97.9   737.1      366        <265567.3   1250     202    747         6462
I     533.6    1371       62.0    291        88.8   682.0      384        1076.8      1122     177    677         5972
F3    562.7    906        66.4    159        75.9   478.4      321        2117.7      856      190    579         4320
F20   490.6    868        52.7    154        74.3   440.5      312        12379.2     811      156    567         4019


Table 1: Contraction Order Experiments. We report the average and maximum over all nodes v
of the number of nodes and arcs in the CCH-search space of v, the number of arcs and triangles in
the CCH, and the induced upper treewidth bound. We additionally report the order computation
times, the customization³ times, and the average shortest path distance query times. Only the
customization times are parallelized using 4 cores. The customization times are the median over
9 runs to eliminate running time variance. The query running times are averaged over 10^6 st-queries
with s and t picked uniformly at random. Finally, we report the memory needed per directed
32-bit weight, including the input graph weights, and for the weight-independent CCH structure.


of arcs in CCH is orders of magnitude smaller than the sum over the arcs in all search spaces. 
Further, minimizing the number of arcs in the CCH is not necessarily the same as minimizing 
the search space sizes. This explains why Metis beats KaHip in terms of CCH size but not 
in terms of search space size. InertialFlow seems to be comparable to Metis in terms of CCH 
size, as its CCH arc count is sometimes slightly lower and sometimes slightly higher. However, 
FlowCutter beats all competitors and clearly achieves the smallest CCH sizes. 

A third important order quality metric is the number of triangles in the CCH. Metis is 
competitive on the two smaller graphs, but is clearly dominated on the continental-sized graphs. 
InertialFlow and KaHip seem to be very similar, with the exception of the USA graph where 
InertialFlow comes out slightly ahead. FlowCutter also wins with respect to this quality metric, 
producing between 20% and 30% fewer triangles than the closest competitor. 

As the CCH is essentially a chordal graph, and chordal graphs are closely tied to tree 
decompositions, we obtain upper bounds on the treewidth of the input graphs as a side product. 
This quality metric is not directly related to CCH performance, but it is indirectly related, as most 
of the other criteria can be bounded in terms of it. As such it reflects the same trend: Metis is 
worst, followed by InertialFlow, followed by KaHip, and FlowCutter has the best bounds. 
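
The relation between a contraction order, the CCH size, and the treewidth bound can be made 
concrete with a small sketch. The following Python function is an illustration under simplifying 
assumptions and not the implementation used in the experiments: the graph is a plain adjacency 
dictionary, and the names `eliminate`, `adj` and `order` are hypothetical. It contracts the nodes 
in the given order, inserts the fill-in edges, and counts the arcs and triangles of the resulting 
chordal supergraph. The largest number of higher-ranked neighbours seen during the elimination 
is an upper bound on the treewidth. 

```python
def eliminate(adj, order):
    """Contract nodes in `order`; return (arc_count, triangle_count, treewidth_bound).

    adj   -- dict mapping each node to the set of its neighbours (undirected graph)
    order -- list containing every node exactly once (the contraction order)
    """
    # Work on a copy so the caller's graph is left untouched.
    g = {v: set(ns) for v, ns in adj.items()}
    arcs = triangles = width = 0
    for v in order:
        higher = g.pop(v)              # neighbours that are not yet eliminated
        for u in higher:
            g[u].discard(v)
        d = len(higher)
        arcs += d                      # upward arcs of v in the chordal supergraph
        triangles += d * (d - 1) // 2  # every pair of upward neighbours closes a triangle
        width = max(width, d)          # largest upward degree bounds the treewidth
        for u in higher:               # fill-in: upward neighbours become a clique
            g[u] |= higher - {u}
    return arcs, triangles, width


if __name__ == "__main__":
    path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
    print(eliminate(path, [0, 1, 2, 3]))  # contracting from one end needs no fill-in
    print(eliminate(path, [1, 2, 0, 3]))  # contracting inner nodes first adds fill-in
```

On the toy path the second order produces more arcs, more triangles and a larger width than the 
first, which mirrors why the choice of order matters for the metrics in Table 1. 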

Footnote on customization times: Several CCH customization variants exist. Ours is non-amortized, 
non-perfect, with SSE and uses precomputed triangles. The CCH structure space consumption 
includes the precomputed triangles. 

max ε  Achieved ε [%]                    Cut Size                      Running Time [s]
[%]       F20       K       M       I      F20      K      M      I       F20       K       M       I
0       0.000   0.000   0.000   0.000       39    157     51    306      59.8    30.8     0.8     1.1
1       0.169   0.184   0.000   0.566       31     31     52     93      53.2    14.6     0.8     1.4
3       2.293   2.300   0.001   1.112       29     29     61     64      51.0    24.0     0.8     1.7
5       2.293   2.293   0.005   1.571       29     29     42     62      51.0    36.4     0.8     2.3
10      2.293   2.304   0.001   0.642       29     29     43     37      51.0    76.2     0.8     2.2
20     16.706   2.756   0.000   2.656       28     30     41     29      49.6    15.0     0.9     2.4
30     16.706   2.768  13.936   5.484       28     29     51     29      49.6    15.5     0.8     2.9
50     49.058   2.768   0.000  40.833       24     29     39     27      43.2    15.5     0.8     3.7
70     49.058   2.768  41.178  42.591       24     29   4310     26      43.2    15.4     0.8     4.9
90     89.838   2.768  47.370  85.555       14     29   3711     18      25.4    15.6     0.9     5.2

(a) California and Nevada 

max ε  Achieved ε [%]                    Cut Size                      Running Time [s]
[%]       F20       K       M       I      F20      K      M      I       F20       K       M       I
0       0.000   0.000   0.000   0.000      240    716    369   1180    1390.3   369.1     3.3     4.3
1       0.132   0.998   0.000   0.089      220    245    360    391    1342.9    80.2     3.3     7.9
3       0.132   0.457   0.000   0.008      220    227    372    319    1342.9   112.5     3.1    10.2
5       4.894   0.464   0.000   0.857      213    227    369    276    1319.0   158.3     3.3    12.3
10      9.330   0.043   0.000   0.375      180    228    375    241    1181.5   338.1     3.1    16.8
20     10.542   3.139   0.000   0.132      162    250    375    220    1089.5    75.5     3.1    25.6
30     10.542   3.139   0.017   7.384      162    250    369    203    1089.5    75.4     3.1    34.9
50     44.386   3.139  33.336  10.542      155    250   9881    162    1047.8    75.3     3.2    47.5
70     66.655   3.139  41.178  44.386       86    250  14375    155     591.6    75.5     3.2    82.8
90     84.199   3.139  83.087  84.257       13    250     28     17      92.8    75.4     3.3    17.1

(b) Central Europe 


Table 2: Pareto-Set Experiments. We report the balance, the cut size and the computation time 
for various partitioners and allowed maximum imbalance. For FlowCutter the computation time 
includes the time needed to compute all less balanced cuts in the Pareto cut set. 


Quality comes at a price and thus the computation times of the orders follow the opposite 
trend: FlowCutter is the slowest, followed by InertialFlow, while Metis is astonishingly fast. 
Where KaHip fits into the picture is unclear, as the nested dissection implementation employed 
is tuned only for order quality and not for computation speed. However, the times in the next 
experiment suggest that a well-tuned implementation would be between FlowCutter and InertialFlow. 

6.2 Pareto Cut Set Experiments 

In the previous experiment we demonstrated that FlowCutter produces the best contraction 
orders. In this section we look at the Pareto cut sets of two graphs in more detail. Selecting 
meaningful and representative testing instances is difficult. The cuts of the USA graph are 
dominated by the cut induced by the Mississippi, as is demonstrated in Appendix C. The Europe 
graph is problematic as the top level cuts behave differently from nearly all lower level cuts. On 
the top level there are many comparatively weakly connected peninsulas. This structure is very 
rare on the lower levels. This leads to a special behavior that we discuss in detail in Appendix B 
and that can be summarized as follows: Cutting the peninsulas leads to a smaller cut but only 
delays the inevitable cut through central Europe in a recursive setup. Cutting the peninsulas 
thus looks clearly better in isolation, even though it is not clearly superior when considering a 
recursive partitioning. We therefore run experiments on a subgraph of the Europe graph with 
latitude ∈ [45, 52] and longitude ∈ [-2, 11] that encompasses most of Central Europe, i.e., with 
all the peninsulas cut off. We additionally pick the DIMACS California&Nevada graph because [5] 
determined an optimal cut of this graph for ε = 0. The Appendix additionally contains numbers 
for the Colorado graph. We compare KaHip 0.71, Metis 5.1.0, InertialFlow and FlowCutter-20 in 
terms of edge cut sizes. The first three compute a single cut, whereas FlowCutter computes a 
Pareto-set, such as the one illustrated in Figure 6. We therefore run the first three for various 
choices of ε. We use KaHip-strong with --enforce_balance for ε = 0. All other parameters have 
default values. 
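
The Central Europe test instance is described above only by its latitude and longitude ranges. 
As a minimal sketch of such an extraction, assuming node coordinates are available as 
(latitude, longitude) pairs, the induced subgraph can be obtained as follows. The function name 
`bounding_box_subgraph` and the input layout are hypothetical, and any further post-processing 
the authors may have applied (for example keeping only one connected component) is not 
covered here. 

```python
def bounding_box_subgraph(coords, edges, lat_range=(45.0, 52.0), lon_range=(-2.0, 11.0)):
    """Return the nodes inside the box and the edges with both end points inside it.

    coords -- dict mapping node id to a (latitude, longitude) pair
    edges  -- iterable of (u, v) node id pairs
    The default ranges are the ones stated in the text for Central Europe.
    """
    lat_lo, lat_hi = lat_range
    lon_lo, lon_hi = lon_range
    kept = {
        v
        for v, (lat, lon) in coords.items()
        if lat_lo <= lat <= lat_hi and lon_lo <= lon <= lon_hi
    }
    # An edge survives only if neither end point was cut away.
    kept_edges = [(u, v) for u, v in edges if u in kept and v in kept]
    return kept, kept_edges


if __name__ == "__main__":
    # Toy data: two nodes inside the box, one node far to the south-east.
    coords = {"a": (49.0, 8.4), "b": (50.1, 8.7), "c": (41.0, 16.9)}
    edges = [("a", "b"), ("b", "c")]
    print(bounding_box_subgraph(coords, edges))
```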

Results. Table 2 summarizes our results. Metis produces extremely bad cuts for imbalances 
above 70%. Strangely, KaHip has problems with perfect balance. This is unexpected as KaHip 
was optimized for perfect balance HZ]. This is most likely the result of the default parameters 
not being optimized for road graphs. KaHip and Metis mostly ignore the allowed imbalance. 
The maximum achieved imbalance of KaHip is 3.2% even though 90% is allowed. Metis is nearly 
always well below 1%. Interestingly, increasing the allowed imbalance can increase the achieved 
cut sizes. We conclude that computing a full Pareto-cut-set for a road graph is not possible in 
the straightforward way with KaHip or Metis. 

InertialFlow is bad at finding highly balanced cuts. Fortunately, for higher values of ε 
competitive cuts are found. This explains why the computed contraction orders are competitive. 
A significant advantage of InertialFlow compared to Metis and KaHip is that a higher maximum 
imbalance cannot increase the cut size. Unfortunately, InertialFlow has its own set of problems. 
It does not find the best cut just below the allowed maximum imbalance. For example, the good 
cut through Europe with ε = 10.542% is not found when allowing a maximum imbalance of 
30%. A maximum imbalance of 50% is necessary, i.e., the choice of 30% vs 50% determines 
whether a 10.5% cut is found or not. Unfortunately, a higher maximum imbalance is not always 
better. Consider the two cuts with 29 edges on California. They differ in the achieved balance, 
i.e., two cuts with the same size but a different balance exist. InertialFlow does not find the 
variant with the better balance if the maximum allowed imbalance is too high. Further, it fails 
to find the 29-edge cut with the best balance, which is only found by KaHip and FlowCutter. 
Unfortunately, KaHip also cannot find it reliably, as it finds 4 different cuts with 29 edges 
and varying balances. Only FlowCutter reliably finds the variant with the best balance. In [5] 
an optimal California cut for ε = 0% with 32 edges was computed. All tested algorithms are 
therefore suboptimal, as the best one finds a cut with 39 edges. However, even a slight imbalance 
of 1% is enough for FlowCutter and KaHip to find cuts with 31 edges. The achieved 1% cuts 
could therefore be optimal. Metis is the fastest, followed by InertialFlow, followed by KaHip. 
Positioning FlowCutter in this list is difficult, as (a) it is the only one to compute a Pareto cut 
set, enabling plots such as those in Figure 6, and (b) even if one is only interested in a single 
cut, it honors the maximum imbalance parameter much better. 
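
To illustrate what optimizing cut size and balance in the Pareto sense means for the numbers 
in Table 2, the following sketch filters a list of (achieved imbalance, cut size) pairs down to 
the non-dominated ones. It is a generic dominance filter written for this illustration and not 
part of FlowCutter; the function name `pareto_front` is hypothetical. The sample input is the 
KaHip column of Table 2(a). 

```python
def pareto_front(cuts):
    """Return the (imbalance, cut_size) pairs that no other pair dominates.

    A pair dominates another one if it is at least as good in both criteria
    (smaller or equal imbalance and smaller or equal cut size) and differs from it.
    """
    front = []
    best_size = float("inf")
    # Sweep by increasing imbalance; a cut is kept only if it is strictly
    # smaller than every cut that is at least as balanced.
    for imbalance, size in sorted(set(cuts)):
        if size < best_size:
            front.append((imbalance, size))
            best_size = size
    return front


if __name__ == "__main__":
    # KaHip on California and Nevada, one entry per allowed maximum imbalance.
    kahip = [
        (0.000, 157), (0.184, 31), (2.300, 29), (2.293, 29), (2.304, 29),
        (2.756, 30), (2.768, 29), (2.768, 29), (2.768, 29), (2.768, 29),
    ]
    print(pareto_front(kahip))
```

Of the four differently balanced 29-edge cuts that KaHip reports, only the most balanced one 
survives the filter, which is the variant the text refers to as the one with the best balance. 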

7 Conclusion and Future Research 

We introduced FlowCutter, a bisection algorithm that optimizes balance and cut size in the 
Pareto sense. We used it to compute contraction orders (also called elimination or minimum 
fill-in orders) and have shown that it beats the state of the art in terms of quality on road graphs. 

FlowCutter needs two initial nodes on separate sides of the cut. Currently these are 
determined by random sampling. A better selection strategy could decrease the number of samples 
needed. Furthermore, investigating other piercing heuristics could be beneficial. 

Acknowledgment: We thank Roland Glantz for helpful discussions. 

Figure 6: F20 Pareto cuts, C. Europe. 

A Walshaw Benchmark Set 


A popular set of graph partitioning benchmark instances is maintained by Walshaw [21]. The 
data contains 34 graphs and solutions to the edge-bisection problem with non-connected sides 
and maximum imbalance values of ε = 0%, ε = 1%, ε = 3%, and ε = 5%. These archived 
solutions are the best cuts that any partitioner has found so far. A few of them were even 
proven to be optimal [5]. Comparing against these archived solutions allows us to compare 
FlowCutter quality-wise against the state of the art. We want to stress that this state of the 
art was computed by a large mixture of algorithms with an even larger set of parameters that 
may have been chosen in instance-dependent ways. We compare this against a single algorithm 
with a single set of parameters. Further, FlowCutter was designed for higher imbalances than 
5%. It was not tuned for the cases with a lower imbalance. FlowCutter only computes cuts with 
connected sides. We therefore filter out all graphs that are either not connected or where the 
archived ε = 0-solution has non-connected sides. Of the 34 graphs only 24 remain. The results 
are reported in Tables 3 and 4. 
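
The filter described above needs a test for whether both sides of an archived bisection are 
connected. A minimal sketch of such a test, assuming an adjacency-dictionary graph and one side 
given as a set of nodes, is shown below. The function names are hypothetical and the code is 
not taken from the experimental setup. 

```python
from collections import deque


def side_is_connected(adj, side):
    """Check by breadth-first search that `side` induces a connected subgraph."""
    side = set(side)
    if not side:
        return False
    start = next(iter(side))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            # Only walk along edges that stay inside the side.
            if u in side and u not in seen:
                seen.add(u)
                queue.append(u)
    return len(seen) == len(side)


def has_connected_sides(adj, side_a):
    """True if both side_a and its complement induce connected subgraphs."""
    side_a = set(side_a)
    side_b = set(adj) - side_a
    return side_is_connected(adj, side_a) and side_is_connected(adj, side_b)


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(has_connected_sides(path, {0, 1}))  # both halves of the path are connected
    print(has_connected_sides(path, {0, 2}))  # neither {0, 2} nor {1, 3} is connected
```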

For ε = 5% there are only 6 graphs where FlowCutter does not match the best known cut 
quality. These are: “144”, “cs4”, “m14b”, “wave”, “wing”, and “wing_nodal”. For three of these 
graphs FlowCutter finds cuts that are larger by a negligible amount of at most 5 edges. For the 
other three the cuts found are larger but are still close to the best known solutions. For lower 
imbalances the results are not quite as good but still very close to the best known solutions. 

In terms of running time the results are more mixed. Some cuts are found very quickly, while 
FlowCutter needs a significant amount of time on others. This is due to the fact that its running 
time is in O(cm). If both the cut size c and the edge count m are large then this running time is 
high. However, for graphs with small cuts the algorithm scales nearly linearly in the graph size. 
Note that FlowCutter does not only compute the highly balanced cuts reported in the table. 

                              minimum edges in cut for             running
graph         algorithm       ε = 0%   ε = 1%   ε = 3%   ε = 5%   time [s]

144           FlowCutter 20     6649     6608     6514     6472    2423.82
144K nodes    FlowCutter 100    6515     6479     6456     6366   10437.91
1074K edges   Reference         6486     6478     6432     6345

3elt          FlowCutter 20       90       89       87       87       0.36
4720 nodes    FlowCutter 100      90       89       87       87       1.87
13K edges     Reference           90       89       87       87

4elt          FlowCutter 20      149      138      137      137       1.97
15K nodes     FlowCutter 100     139      138      137      137       9.50
45K edges     Reference          139      138      137      137

598a          FlowCutter 20     2417     2390     2367     2336     545.69
110K nodes    FlowCutter 100    2400     2388     2367     2336    2675.32
741K edges    Reference         2398     2388     2367     2336

auto          FlowCutter 20    10609    10283     9890     9450   13445.66
448K nodes    FlowCutter 100   10549    10283     9823     9450   66249.82
3314K edges   Reference        10103     9949     9673     9450

bcsstk30      FlowCutter 20     6454     6347     6251     6251     245.65
28K nodes     FlowCutter 100    6408     6347     6251     6251    1230.27
1007K edges   Reference         6394     6335     6251     6251

bcsstk33      FlowCutter 20    10220    10097    10064     9914     118.38
8738 nodes    FlowCutter 100   10177    10097    10064     9914     573.02
291K edges    Reference        10171    10097    10064     9914

brack2        FlowCutter 20      742      708      684      660      58.13
62K nodes     FlowCutter 100     742      708      684      660     283.99
366K edges    Reference          731      708      684      660

crack         FlowCutter 20      184      183      182      182       2.17
10K nodes     FlowCutter 100     184      183      182      182      10.97
30K edges     Reference          184      183      182      182

cs4           FlowCutter 20      381      371      367      360      11.68
22K nodes     FlowCutter 100     372      370      365      357      58.11
43K edges     Reference          369      366      360      353

cti           FlowCutter 20      342      318      318      318       6.10
16K nodes     FlowCutter 100     339      318      318      318      30.55
48K edges     Reference          334      318      318      318

fe_4elt2      FlowCutter 20      130      130      130      130       1.86
11K nodes     FlowCutter 100     130      130      130      130       9.19
32K edges     Reference          130      130      130      130

Table 3: Performance on the Walshaw benchmark set, Part 1. “Reference” is the best known 
bisection for the graph as maintained by Walshaw. “FlowCutter 20” uses 20 random st-pairs 
and “FlowCutter 100” uses 100 random st-pairs. 

                              minimum edges in cut for             running
graph         algorithm       ε = 0%   ε = 1%   ε = 3%   ε = 5%   time [s]

fe_ocean      FlowCutter 20      504      431      311      311      89.70
143K nodes    FlowCutter 100     483      408      311      311     418.60
409K edges    Reference          464      387      311      311

fe_rotor      FlowCutter 20     2115     2091     1959     1948     334.58
99K nodes     FlowCutter 100    2106     2067     1959     1940    1636.78
662K edges    Reference         2098     2031     1959     1940

fe_sphere     FlowCutter 20      386      386      384      384       5.98
16K nodes     FlowCutter 100     386      386      384      384      30.84
49K edges     Reference          386      386      384      384

fe_tooth      FlowCutter 20     3852     3841     3814     3773     413.48
78K nodes     FlowCutter 100    3836     3832     3790     3773    2067.54
452K edges    Reference         3816     3814     3788     3773

finan512      FlowCutter 20      162      162      162      162       8.11
74K nodes     FlowCutter 100     162      162      162      162      39.01
261K edges    Reference          162      162      162      162

m14b          FlowCutter 20     3858     3826     3823     3805    2115.07
214K nodes    FlowCutter 100    3836     3826     3823     3804   10512.24
1679K edges   Reference         3836     3826     3823     3802

t60k          FlowCutter 20       80       79       73       65       2.98
60K nodes     FlowCutter 100      80       77       71       65      14.55
89K edges     Reference           79       75       71       65

vibrobox      FlowCutter 20    10614    10356    10356    10356     139.90
12K nodes     FlowCutter 100   10365    10310    10310    10310     680.76
165K edges    Reference        10343    10310    10310    10310

wave          FlowCutter 20     8734     8734     8734     8724    2723.12
156K nodes    FlowCutter 100    8716     8673     8650     8590   13583.59
1059K edges   Reference         8677     8657     8591     8524

whitaker3     FlowCutter 20      127      126      126      126       1.49
9800 nodes    FlowCutter 100     127      126      126      126       7.00
28K edges     Reference          127      126      126      126

wing          FlowCutter 20      790      790      790      790      80.11
62K nodes     FlowCutter 100     790      790      781      773     401.82
121K edges    Reference          789      784      773      770

wing_nodal    FlowCutter 20     1767     1764     1715     1691      27.02
10K nodes     FlowCutter 100    1743     1740     1710     1688     134.05
75K edges     Reference         1707     1695     1678     1668

Table 4: Performance on the Walshaw benchmark set, Part 2. 

max ε  Achieved ε [%]                    Cut Size                      Running Time [s]
[%]       F20       K       M       I      F20      K      M      I       F20       K       M       I
0       0.000   0.000   0.003   0.000      276   1296    402   1579    3475.5  1887.5     8.9    11.3
1       0.930   1.000   0.003   0.337      234    169    398    417    3292.7   224.7     8.9    19.0
3       2.244   2.717   0.003   0.357      221    130    306    340    3215.0   317.5     8.9    28.0
5       4.918   2.976   0.003   0.171      216    129    276    299    3181.7   510.2     8.9    33.6
10      9.453   8.092   0.003   0.174      188    112    460    284    2913.3   934.0     9.0    49.3
20      9.453   9.405   0.003   7.539      188    126    483    229    2913.3   198.8     8.9    69.9
30      9.453   9.232   0.003   9.060      188    128    465    202    2913.3   193.6     8.9    94.0
50     42.080   9.232  33.336   9.453       58    128  31127    188     949.4   194.1     9.1   172.9
70     67.497   9.232  41.178  64.724       22    128  53365     38     371.9   193.9     9.4    79.0
90     72.753   9.232  70.741  72.753        2    128     44      2      51.9   194.1     9.0    18.9

Table 5: Results for the DIMACS Europe graph with 18M nodes and 22M edges. The KaHip 
cut with 169 edges does not have connected sides. 





(a) K, Sat.-Cut (b) F, Sat.-Cut (c) F, Rhine-Cut 

Figure 7: Various good cuts found. “K” is a cut found with KaHip. “F” was found with 
FlowCutter. 


B Europe Graph 

We performed bisection experiments on the DIMACS Europe graph. The results are presented in 
Table 5. The KaHip 112-edge cut is illustrated in Figure 7a and the 188-edge cut of FlowCutter 
is depicted in Figure 7c. The reason for the difference is that our piercing heuristic searches for 
cuts that are roughly perpendicular to a line whereas the cut found by KaHip is roughly a circle. 
This is due to Europe’s very special topology. It consists of a well connected center consisting of 
France, Germany, Belgium, Luxembourg, the Netherlands and Denmark. This center is surrounded by 
4 satellites. These are: Great Britain, Spain and Portugal, Italy, Norway and Sweden. It is 
not clear to which part Austria and Switzerland belong. These satellites can be very loosely 
connected to the center. For example, Scandinavia is only connected to the rest using 2 edges. 
These two edges are the two highway sides of a bridge in Copenhagen. This is the 2-edge 
cut with 73% imbalance found by FlowCutter and InertialFlow. Apparently the ferries to and 
from Scandinavia are missing in the DIMACS Europe graph, and Scandinavia contains about 
14% of all nodes. A minimum balanced cut consists of separating the center from its satellites. 
KaHip finds one of these satellite-cuts. FlowCutter with the distance piercing heuristic does 
not. FlowCutter finds a Rhine-cut through central Europe. It goes mostly along the Rhine, and 
then goes along the border between Italy and Austria. 
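
The two figures quoted for the Scandinavia cut, about 14% of all nodes and 73% imbalance, are 
consistent with each other under the common convention that a bisection with achieved 
imbalance ε has a larger side of (1 + ε) · n / 2 nodes. This convention is an assumption here, 
as the definition is not restated in this appendix, and the helper names below are hypothetical. 
A short sketch of the arithmetic: 

```python
def achieved_imbalance(size_a, size_b):
    """Imbalance eps such that the larger side holds (1 + eps) * n / 2 nodes."""
    n = size_a + size_b
    return 2 * max(size_a, size_b) / n - 1


def smaller_side_fraction(eps):
    """Fraction of all nodes on the smaller side for a given imbalance eps."""
    return (1 - eps) / 2


if __name__ == "__main__":
    # A side with 14% of the nodes gives an imbalance of roughly 72%.
    print(achieved_imbalance(14, 86))
    # The reported 72.753% corresponds to a smaller side with roughly 13.6% of the nodes.
    print(smaller_side_fraction(0.72753))
```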

At first glance it seems as if KaHip wins on this instance. However, in a nested dissection 
context satellite-cuts are not necessarily beneficial. Choosing a satellite-cut at the top level 
only delays the inevitable Rhine-cut by one level in the separator tree. Theory [9] predicts that 
picking a small balanced cut C at the top level is good when the cuts in both resulting sides 
are significantly smaller than C. However, the top levels of the Europe graph do not have this 
structure. A second level Rhine-cut is significantly larger than a top level satellite-cut. Further, 
the union of the satellites is only very loosely connected. This is a huge contrast to the large 
Rhine-cut needed for the other side. The satellite-cuts are thus highly imbalanced in this sense. 
This is the reason why KaHip’s finding a smaller top level cut does not contradict FlowCutter 
producing better contraction orders. 

A question that arises is whether the cut found by KaHip is the best cut separating the center 
from the satellites. To investigate this question we run FlowCutter with multiple handpicked 
source and target nodes. We pick a source node in the middle of Europe and a target node in 
each of the 4 satellites. We pick the closest nodes to the coordinates reported in Table 6. 
FlowCutter does not find the 112-edge cut with 8% imbalance found by KaHip. It does however 
find a probably superior satellite-cut with 87 edges and 15% imbalance that KaHip misses. This 
cut is illustrated in Figure 7b. The main difference is to which side Austria belongs. Also, 
the cut through Switzerland and the cut at the French-Spanish border differ slightly. 

          Lat    Lon   Place
Source   49.0    8.4   Karlsruhe
Target   41.0   16.9   Bari
         38.7   -9.1   Lisbon
         53.5   -2.8   Liverpool
         59.2   18.0   Stockholm

Table 6: Handpicked source and target nodes. 

max ε  Achieved ε [%]                    Cut Size                      Running Time [s]
[%]       F20       K       M       I      F20      K      M      I       F20       K       M       I
0       0.000   0.000   0.001   0.000       37     74     40    259      12.1     4.5     0.1     0.2
1       0.277   0.970   0.002   0.088       29     34     39     96       9.9     2.8     0.2     0.3
3       0.277   2.999   0.000   0.748       29     29     51     70       9.9     4.0     0.2     0.3
5       4.263   4.290   0.025   0.897       28     27     40     60       9.6     5.1     0.1     0.3
10      9.073   9.467   0.001   1.413       23     23     47     46       8.1     9.1     0.2     0.3
20     19.995  11.761  16.671  13.984       19     22    376     27       6.8     3.2     0.2     0.3
30     27.606  12.249  23.080  23.125       14     20    521     21       5.2     3.0     0.2     0.4
50     40.630   9.772  42.409  36.365       12     23     14     14       4.5     3.4     0.1     0.4
70     57.602  12.000  41.177  48.771       11     23   1124     12       4.2     3.5     0.2     0.5
90     87.330  12.084  47.362  81.495        8     20    856      9       3.1     3.5     0.2     0.6

Table 7: Results for the DIMACS Colorado graph with 436K nodes and 521K edges. In [3] it 
was shown that an optimal perfectly balanced cut has 29 edges. 




max ε  Achieved ε [%]                    Cut Size                      Running Time [s]
[%]       F20       K       M       I      F20      K      M      I       F20       K       M       I
0       0.000   0.000   0.001   0.000      119   1342    245   1579    1902.0  2489.1    12.2    15.7
1       0.594   0.545   0.000   0.388       86    109    216    406    1717.6   274.7    12.1    23.6
3       2.333   2.334   0.001   0.071       76     76    204    257    1584.2   720.8    12.2    31.7
5       3.844   3.845   0.001   0.102       61     61    255    186    1377.5  1262.3    12.4    35.5
10      3.844   3.846   0.000   3.169       61     61    196     81    1377.5  2073.7    12.4    29.7
20      3.844   3.850   0.001   3.866       61     61    138     61    1377.5   249.0    12.2    45.6
30      3.844   3.850   0.001   3.866       61     61    232     61    1377.5   249.1    12.3    64.8
50      3.844   3.850   0.001   3.866       61     61    198     61    1377.5   248.8    12.4   100.7
70     69.575   3.850  41.178  66.537       46     61  64414     61    1056.2   249.6    12.9   158.7
90     89.350   3.850  47.370  70.315       42     61  60071     46     965.2   249.2    12.8   201.1

Table 8: Results for the DIMACS USA graph with 24M nodes and 29M edges. 


C Further Experiments 

Tables 7 and 8 contain further Pareto-set experiments. The observed effects essentially follow 
those already discussed for Table 2. 

D Detailed Running Time Analysis 


Lines 1-3 have a running time in O(m) and are therefore unproblematic. The condition in 
line 4 can be implemented in O(1) as follows: S and T only grow. We can therefore check, 
when adding a node to one of the sets, whether it is contained in the other set. If this is the 
case we abort the loop. Outside of the true-branch of the if-statement in line 5, S_R and 
T_R also only grow. We can therefore use the same argument for the condition in line 5. Lines 
6-8 need O(m) running time each time they are executed. However, they are only executed 
when the flow is augmented. This happens c times. The total running time is thus in O(cm). 

Showing that the running time of lines 11-16 is amortized sub-linear is the complex part 
of the analysis. Implementing the growing operations in lines 11 and 16 the naive way needs 
linear running time and is therefore too slow. The naive approach looks at all internal nodes 
to determine all outgoing edges. These are needed to determine which edges are non-saturated. 
However, either the sets only contain a single node x or they were generated by growing 
them and afterwards adding a single additional node y. In either case it is sufficient to look at 
the outgoing edges of x or y because all other outgoing edges must be saturated, as otherwise 
they would have been followed in a previous iteration. 

Outputting the cut in line 12 causes costs linear in the cut size. We account for these when 
calling the piercing oracle in line 13. However, it is non-trivial that we can list all edges in 
the cut in linear time. We do this by maintaining two additional edge sets C_S and C_T. The 
source side cut is in C_S and the target side cut is in C_T. We only describe how to maintain 
C_S. The algorithm for C_T is analogous. Each time we grow S and the graph search algorithm 
encounters a saturated edge e, it adds e to C_S. Every cut edge is saturated and therefore the 
desired cut is a subset of C_S. As S never shrinks, each edge can be added at most once and 
therefore these additions have running time costs within O(m). In line 12 it is possible that 
C_S contains edges that are saturated but not part of the cut. We filter those edges by iterating 
over all edges in C_S and removing those for which both end points are in S. As each edge can be 
removed at most once, the removal costs are within O(m). The remaining edges are the cut. We 
account for the running time needed to skip the cut edges during the filter step when calling 
the piercing oracle in line 13. Lines 14-15 have a constant running time. 

It remains to show that all the calls to the piercing oracle in line 13 in total do not need 
more than O(cm) running time. The key observation here is that each time the oracle is 
called it names a piercing arc e. The next time the oracle is called, e is no longer part of the 
cut and therefore the oracle can no longer return e. Each edge is therefore the piercing arc in 
at most one iteration. The oracle is therefore called at most m times. Each time it has a 
running time linear in the cut size. We can bound the cut size of each step by the final cut size 
c as the cut sizes only increase. The total running time spent in the piercing oracle is therefore 
bounded by O(cm). 
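
The bookkeeping for the source side described above can be sketched in a few lines. The 
following Python class is a simplified illustration and not the authors' implementation: it 
assumes that the set of saturated arcs stays fixed between two flow augmentations and is given 
as a set of directed node pairs, and the names `SourceSide`, `pierce` and `cut` are hypothetical. 
It shows the two ideas of the analysis: growing S only from the newly added node, and listing 
the cut by lazily filtering the candidate set C_S. 

```python
class SourceSide:
    """Source-side reachable set S with a lazily filtered candidate cut C_S."""

    def __init__(self, adj, saturated, source):
        self.adj = adj                # node -> iterable of neighbours
        self.saturated = saturated    # set of directed arcs (v, u) without residual capacity
        self.reached = {source}       # the set S; it only grows
        self.candidates = []          # C_S: saturated arcs met while growing S
        self._grow_from(source)

    def _grow_from(self, start):
        # Only the newly added node is expanded; every node reached earlier
        # was already expanded when it entered S.
        stack = [start]
        while stack:
            v = stack.pop()
            for u in self.adj[v]:
                if (v, u) in self.saturated:
                    self.candidates.append((v, u))  # each arc is appended at most once
                elif u not in self.reached:
                    self.reached.add(u)
                    stack.append(u)

    def pierce(self, node):
        """Add one extra node to S and grow S along non-saturated arcs."""
        if node not in self.reached:
            self.reached.add(node)
            self._grow_from(node)

    def cut(self):
        """Drop candidates whose head has joined S; the rest is the current cut."""
        self.candidates = [(v, u) for (v, u) in self.candidates if u not in self.reached]
        return list(self.candidates)


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    side = SourceSide(path, {(1, 2), (3, 4)}, source=0)
    print(side.cut())   # the saturated arc leaving {0, 1}
    side.pierce(2)
    print(side.cut())   # after piercing, the cut has moved to the next saturated arc
```

Each arc is appended to the candidate list at most once and removed at most once, which is the 
amortization argument used for the O(m) bounds above. 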
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